
feat: add support for GetTimestamp, parse_to_date expressions#3467

Open
rafafrdz wants to merge 16 commits into apache:main from rafafrdz:feat/add-support-to_date

Conversation

@rafafrdz
Contributor

Summary

In order to add support for the Spark SQL to_date function, we first need to cover GetTimestamp as an intermediate expression within the query plan, since it is produced when ParseToDate (the to_date function) is called.

Note to maintainers

Since this is my first contribution to Apache DataFusion Comet, I may have missed some steps in the development or contribution process. Please let me know if anything is missing or needs adjustment.

@rafafrdz rafafrdz force-pushed the feat/add-support-to_date branch from 8033aea to 3c087e1 on February 19, 2026 20:56
@rafafrdz rafafrdz force-pushed the feat/add-support-to_date branch from 3460bcb to 1ace0e2 on March 3, 2026 22:14
@rafafrdz rafafrdz force-pushed the feat/add-support-to_date branch from 3d20bd5 to 294c749 on March 3, 2026 22:42
@rafafrdz rafafrdz requested a review from andygrove March 3, 2026 22:51
Member

@andygrove andygrove left a comment


I have a few concerns about this PR.

The micros-to-days conversion in spark_to_date uses (micros / 1_000_000 / 86_400) as i32. Integer division in Rust truncates toward zero, which gives wrong results for pre-epoch dates with non-midnight times. For example, a timestamp like "1969-12-31 12:00:00" would produce micros = -43200000000. Dividing by 1_000_000 gives -43200, then dividing by 86400 gives 0 (day 0 = 1970-01-01), when the correct answer is day -1 (1969-12-31). You would need floor division here, something like micros.div_euclid(1_000_000).div_euclid(86_400) as i32.
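A minimal sketch of the truncation problem and the suggested fix. The helper names `micros_to_days_truncating` and `micros_to_days` are illustrative, not the actual function names in the PR:

```rust
// Illustrative helpers; the PR's real conversion lives in spark_to_date.
fn micros_to_days_truncating(micros: i64) -> i32 {
    // `/` truncates toward zero, which is wrong for pre-epoch timestamps.
    (micros / 1_000_000 / 86_400) as i32
}

fn micros_to_days(micros: i64) -> i32 {
    // div_euclid floors toward negative infinity, so any time-of-day on a
    // pre-epoch date still maps to that date's day number.
    micros.div_euclid(1_000_000).div_euclid(86_400) as i32
}

fn main() {
    let micros: i64 = -43_200_000_000; // "1969-12-31 12:00:00" UTC
    assert_eq!(micros_to_days_truncating(micros), 0); // wrong: 1970-01-01
    assert_eq!(micros_to_days(micros), -1); // correct: 1969-12-31
    println!("ok");
}
```

For non-negative inputs the two versions agree, which is why the bug only surfaces on pre-epoch dates with non-midnight times.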

The spark_to_chrono format converter only handles a small subset of Spark's SimpleDateFormat patterns (yyyy, MM, dd, HH, mm, ss, fractions, XXX, Z). Patterns like M (single-digit month), d (single-digit day), yy (2-digit year), a (AM/PM), or E (day of week) would pass through unconverted and silently produce wrong results. It might be worth either validating that the format string only contains supported tokens and falling back to Spark for unsupported ones, or documenting the supported subset in getSupportLevel.
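One way to implement the suggested validation, sketched as a standalone function. The token list below is an assumption based on the subset named in this comment (with "SSS" standing in for the fraction token), not the PR's actual tables:

```rust
// Hypothetical validator: reject a Spark SimpleDateFormat pattern if it
// contains any pattern letter run outside the supported subset, instead
// of letting unknown tokens pass through and silently parse wrong.
fn validate_spark_format(fmt: &str) -> Result<(), String> {
    // Assumed supported subset; the real converter's table may differ.
    const SUPPORTED: &[&str] = &["yyyy", "MM", "dd", "HH", "mm", "ss", "SSS", "XXX", "Z"];
    let mut chars = fmt.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_ascii_alphabetic() {
            // Collect a run of the same pattern letter, e.g. "yyyy" or "M".
            let mut token = String::new();
            while chars.peek() == Some(&c) {
                token.push(c);
                chars.next();
            }
            if !SUPPORTED.contains(&token.as_str()) {
                return Err(format!("unsupported format token: {token}"));
            }
        } else {
            // Literal separators such as '-', '/', ':' are always allowed.
            chars.next();
        }
    }
    Ok(())
}

fn main() {
    assert!(validate_spark_format("yyyy-MM-dd HH:mm:ss").is_ok());
    assert!(validate_spark_format("M/d/yy").is_err()); // single-digit month/day, 2-digit year
    println!("ok");
}
```

Returning an error here would let the Scala side fall back to Spark for unsupported patterns rather than produce incorrect dates.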

Speaking of getSupportLevel, CometParseToDate doesn't override it, so it defaults to Compatible(). It probably needs similar type checks to CometGetTimestamp for TimestampNTZType and other unsupported input types.

rafafrdz and others added 3 commits March 21, 2026 08:24
- Fix pre-epoch date conversion using div_euclid (floor division) so
  timestamps like "1969-12-31 12:00:00" correctly map to day -1 instead
  of day 0
- Add format validation in spark_to_chrono to reject unsupported Spark
  SimpleDateFormat tokens (e.g. single-digit month, 2-digit year, AM/PM)
  instead of silently producing wrong results
- Add getSupportLevel to CometParseToDate with type checks for
  TimestampNTZType and other unsupported input types
- Add Rust unit tests for pre-epoch dates, unsupported formats, and
  format validation
- Add SQL file tests and Scala tests for pre-epoch date handling
@rafafrdz rafafrdz requested a review from andygrove March 21, 2026 08:26


Development

Successfully merging this pull request may close these issues.

[Feature] Support Spark expression: parse_to_date
